Skip to content

Conversation

ColtAllen
Copy link
Collaborator

@ColtAllen ColtAllen commented Aug 4, 2025

Description

@williambdean also deserves contributor credit for starting this PR.

ModelBuilder was originally developed to build models with an API very similar to the supervised learning models in scikit-learn. This works fine for MMMs, but creates considerable tech debt when different APIs are required. This PR is the first of several to clean up accumulated tech debt by dividing ModelBuilder into three separate classes:

Click to toggle UML diagram
classDiagram

class ModelIO {

- str _model_type

+ str version

+ az.InferenceData | None idata

+ dict sampler_config

+ dict model_config

+ id(self) str

- _serializable_model_config(self) dict[str, int | float | dict]

+ create_idata_attrs(self) dict[str, str]

+ set_idata_attrs(self, idata) az.InferenceData

+ save(self, fname, **kwargs) None

- @classmethod _model_config_formatting(cls, model_config) dict

+ @classmethod attrs_to_init_kwargs(cls, attrs) dict[str, Any]

+ @classmethod idata_to_init_kwargs(cls, idata) dict[str, Any]

+ build_from_idata(self, idata) None

+ @classmethod load(cls, fname, check)

+ @classmethod load_from_idata(cls, idata, check) "ModelIO"

}

  

class ModelBuilder {

+ str _model_type

+ str version

- __init__(self, model_config, sampler_config) None

+ default_model_config(self) dict

+ default_sampler_config(self) dict

+ build_model(self, **kwargs) None

+ fit(self, **kwargs) None

+ graphviz(self, **kwargs)

+ table(self, **model_table_kwargs) Table

+ fit_result(self) xr.Dataset

+ fit_result(self, res) None

+ prior

+ prior_predictive

+ posterior

+ posterior_predictive

+ predictions

}

  

class RegressionModelBuilder {

- _validate_data(self, X, y)

- _data_setter(self, X, y) None

+ output_var(self) str

+ build_model(self, X, y, **kwargs) None

+ build_from_idata(self, idata) None

+ create_fit_data(self, X, y) xr.Dataset

+ post_sample_model_transformation(self) None

+ fit(self, X, y, progressbar, random_seed, **kwargs) az.InferenceData

+ predict(self, X, extend_idata, **kwargs) np.ndarray

+ sample_prior_predictive(self, X, y, samples, extend_idata, combined, **kwargs)

+ sample_posterior_predictive(self, X, extend_idata, combined, **sample_posterior_predictive_kwargs)

+ predict_proba(self, X, extend_idata, combined, **kwargs) xr.DataArray

+ predict_posterior(self, X, extend_idata, combined, **kwargs) xr.DataArray

}

ModelBuilder --|> `abc.ABC`  

ModelBuilder --|> ModelIO

RegressionModelBuilder --|> ModelBuilder

Loading

The CLV models still inherit from a stripped-down ModelBuilder class, but all other models now inherit from RegressionModelBuilder. If the MNLogit and Nested Logit models are modified to inherit from ModelBuilder instead, the internals of both could be cleaned up quite a bit.

Related Issue

Checklist


📚 Documentation preview 📚: https://pymc-marketing--1870.org.readthedocs.build/en/1870/

@ColtAllen ColtAllen added this to the 0.16.0 milestone Aug 4, 2025
@ColtAllen ColtAllen requested a review from williambdean August 4, 2025 23:29
@ColtAllen ColtAllen added the model components Related to the various model components label Aug 4, 2025
@github-actions github-actions bot added CLV MMM customer choice Related to customer choice module labels Aug 4, 2025
@ColtAllen
Copy link
Collaborator Author

@williambdean do you still think the check parameter for load is necessary?

Copy link

codecov bot commented Aug 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.15%. Comparing base (32ed4db) to head (f7295af).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1870      +/-   ##
==========================================
+ Coverage   91.89%   92.15%   +0.26%     
==========================================
  Files          64       64              
  Lines        7577     7587      +10     
==========================================
+ Hits         6963     6992      +29     
+ Misses        614      595      -19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@williambdean williambdean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through. Not many objections. What else is needed here?

@ColtAllen
Copy link
Collaborator Author

Looking through. Not many objections. What else is needed here?

Just your thoughts on including the check parameter for the load function. The only justification I see for including it is if backwards-compatibility with older library versions is desirable.

Testing coverage is 99% locally; not sure why the CodeCov bot is always lower than the pytest plugin.

@juanitorduz
Copy link
Collaborator

From my side looks great! @williambdean, anything we are missing?

@juanitorduz
Copy link
Collaborator

As there are no comments I suggest we merge this on :)

Amazing work @ColtAllen !

@juanitorduz juanitorduz enabled auto-merge (squash) August 20, 2025 07:47
@juanitorduz juanitorduz merged commit ad84e90 into pymc-labs:main Aug 20, 2025
29 of 31 checks passed
@PabloRoque
Copy link
Contributor

It was a biggie, hadn't found the time, sorry 🙈

@juanitorduz
Copy link
Collaborator

It was a biggie, hadn't found the time, sorry 🙈

All good! Thank you @PabloRoque 💪

@ColtAllen ColtAllen deleted the break-down-modelbuilder branch August 20, 2025 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLV customer choice Related to customer choice module maintenance MMM model components Related to the various model components ModelBuilder Related to the ModelBuilder class and its children priority: high tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants